ABSTRACT

Linear threshold machines are defined to be those whose
computations are based on the outputs of a set of linear
threshold decision elements. The number of such elements
is called the rank of the machine. An analysis of the
computational geometry of finite-rank linear threshold
machines, analogous to the analysis of finite-order perceptrons
given by Minsky and Papert, reveals that the use of
such machines as "general purpose pattern recognition
systems" is severely limited. For example, these machines
cannot recognize any topological invariant, nor can they
recognize non-trivial figures "in context".
1. Introduction

This paper is a contribution to "computational geometry"
in the spirit of the book Perceptrons by M. Minsky and
S. Papert [1]. That is, we seek insights into the amount of
computation "inherently needed" to recognize various geometric
figures. In doing so, we raise issues about the use of
parallel computation, analogue devices, and other pattern
recognition techniques. This section briefly reviews the
setting given in [1] for such a study and provides an
introduction to the remainder of the paper.

By a retina, $R$, we mean a collection of points, and
by a figure on the retina some subset $X \subset R$. The size of
the retina, $|R|$, is the number of points in $R$. In studying
pattern recognition we usually imagine $R$ to be a finite
set whose points are regarded as the squares in some
two-dimensional planar grid, and "arbitrary geometric figures" as
approximated by some collection of squares (Figure 1).








Figure 1 

Geometric figures on a grid 


A predicate on $R$ is a function $\psi$ defined for
figures $X$ on $R$ which can assume only the values 0 and
1. Examples of geometric predicates are:

$\lceil X \text{ is a square} \rceil$

$\lceil X \text{ is convex} \rceil$

$\lceil X \text{ contains more than 27 points} \rceil$

(Here, as in [1], we use the notation

$\lceil \text{some condition} \rceil$

to mean the value which is 1 if the condition is true and 0
if the condition is false.)

In computational geometry we are interested in synthesizing
"complex" predicates out of "simpler" ones. One
measure of the simplicity of a predicate is its order. A
predicate $\varphi$ is said to be of order $k$ if $\varphi$ makes
its decision by examining at most $k$ points of $R$, i.e.,
if there exists a set $S$ of $k$ points such that

$$\varphi(X) = \varphi(X \cap S) \quad \text{for all } X \subset R.$$

If $\Phi = \{\varphi_1, \ldots, \varphi_n\}$ is a collection of predicates,
then a perceptron based on $\Phi$ is another predicate $\psi$ which
is of the form

$$\psi(X) = \left\lceil \sum_i a_i \varphi_i(X) > \theta \right\rceil,$$

where $a_1, a_2, \ldots, a_n, \theta$ are real numbers.

In other words, a perceptron is the result of a linear
threshold decision applied to a weighted sum of other
predicates. The $a_i$ are the weights and $\theta$ is the threshold.

The order of the perceptron $\psi$ is the maximum order of
any of the predicates in the collection $\Phi$. Notice that a
perceptron of order 1 is precisely what is usually called a
linear threshold function on $R$.
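The definitions of order and of a perceptron can be made concrete with a small brute-force sketch. It is our own illustration, not taken from [1]: figures are Python frozensets of points, predicates are functions returning 0 or 1, and the helper names `perceptron` and `depends_only_on` are hypothetical. A predicate has order at most $k$ when it depends only on some $k$-point set $S$; the check below tests a given $S$ by trying every figure, so it is only feasible on very small retinas.

```python
from itertools import combinations

def perceptron(preds, weights, theta):
    """psi(X) = [sum_i a_i * phi_i(X) > theta] for 0/1-valued predicates phi_i."""
    def psi(X):
        total = sum(a * phi(X) for a, phi in zip(weights, preds))
        return 1 if total > theta else 0
    return psi

def depends_only_on(pred, retina, S):
    """True iff pred(X) == pred(X & S) for every figure X on the retina,
    i.e. pred looks only at the points of S (brute force, tiny retinas only)."""
    S = frozenset(S)
    pts = list(retina)
    for k in range(len(pts) + 1):
        for X in combinations(pts, k):
            X = frozenset(X)
            if pred(X) != pred(X & S):
                return False
    return True
```

For example, with the order 1 predicates $\lceil x \in X \rceil$ on a four-point retina, unit weights and threshold 2, `perceptron` computes $\lceil X \text{ has more than 2 points} \rceil$, a linear threshold function; the predicate "both points 0 and 1 are in $X$" depends only on $\{0, 1\}$ but not on $\{0\}$ alone.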

In [1] Minsky and Papert consider questions such as
"What order perceptrons are necessary in order to compute
various geometric predicates?" They show, for example, that

$\lceil X \text{ is locally convex} \rceil$ can be computed with a
perceptron of order 3,

and $\lceil X \text{ is a (discrete approximation to a) circle} \rceil$ can
be computed with order 4.

More interesting are the results which illustrate
fundamental limitations of perceptrons. One can ask if a
predicate is of finite order, i.e., if it can be computed by
a perceptron of some fixed order, regardless of the size of
the retina. (See §1.6 of [1] for a formal definition.)

Minsky and Papert show that such predicates as

$\lceil X \text{ is connected} \rceil$

$\lceil X \text{ has at least 3 components} \rceil$

are not of finite order. Indeed, a main theorem of [1] states
that the only topologically invariant predicates which can
be computed in finite order are those which are functions of
the Euler characteristic (see [1] §5.9).

Another kind of simple machine, the Gamba perceptron,
is described in [1]: a kind of "perceptron" in which each
of the "simple predicates" $\varphi_i$ is itself a linear threshold
function:

$$\varphi_i(X) = \left\lceil \sum_{x_j \in R} b_{ij}\, x_j(X) > \theta_i \right\rceil,$$

$$\psi(X) = \left\lceil \sum_i a_i \varphi_i(X) > \theta \right\rceil.$$

Here $x_j(X)$ denotes the order 1 predicate

$$x_j(X) = \lceil x_j \in X \rceil.$$

Viewed as a perceptron, the Gamba machine $\psi$ has order
equal to the size of the retina, $|R|$, since each $\varphi_i$ looks
at the entire retina. Hence the order restriction techniques
of [1] do not give much information about the capabilities of
this kind of device.
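A two-layer Gamba machine can be sketched as follows, assuming the retina points are numbered 0, 1, ..., n-1 and figures are sets of those numbers; `gamba` is a hypothetical helper, not an interface from [1]. Because $x_j(X) = \lceil x_j \in X \rceil$, each first-layer sum reduces to adding the weights $b_{ij}$ of the points of $X$.

```python
def gamba(B, thetas, a, theta):
    """Two-layer Gamba machine on a retina whose points are 0, 1, ..., n-1.

    B[i][j] is the weight b_ij of point j in first-layer unit i.
    """
    def psi(X):
        # First layer: linear threshold functions, each looking at the whole retina.
        # Since x_j(X) = [j in X], the weighted sum only runs over the points of X.
        layer1 = [1 if sum(row[j] for j in X) > th else 0
                  for row, th in zip(B, thetas)]
        # Second layer: a linear threshold decision on the first-layer outputs.
        return 1 if sum(ai * v for ai, v in zip(a, layer1)) > theta else 0
    return psi
```

With `B=[[1, 1], [1, 1]]`, `thetas=[0.5, 1.5]`, `a=[1, -1]` and `theta=0.5`, the first unit detects "at least one point" and the second "both points", so the machine computes parity on a two-point retina, which no order 1 perceptron can do.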

From another point of view, however, the Gamba machine
is nowhere near as complex as the general order-$|R|$
perceptron. Rather, it is a simple "two-layer" device in
which each layer is made up of linear threshold elements.
More generally, one could consider "multilayer machines", in
which each layer makes linear threshold decisions based on
the results of previous layers.

This paper deals with properties of these "multilayer"
machines. The computational devices we will be concerned
with are called Linear Threshold Machines. A linear threshold
machine is a general purpose computer together with a number
of linear threshold elements $\varphi_1, \ldots, \varphi_r$. The general purpose
computer is allowed to perform any computation whatsoever, with
one restriction: computations cannot be based upon "direct
observation" of the retina itself, but rather upon the outputs
of the threshold functions $\varphi_1, \ldots, \varphi_r$ (Figure 2). The
rank of the linear threshold machine is defined to be the number
$r$ of linear threshold functions $\varphi_1, \ldots, \varphi_r$.



Figure 2 


Linear threshold machine of rank r 
This class of machines includes the Gamba perceptron,
the multilayer machines, and in fact any kind of pattern
recognition device that can be constructed out of linear
threshold elements so long as the arrangement of interconnections
does not include any loops. (Permitting loops would allow one
to build a universal computer out of linear threshold elements.)

We begin, in Section 2, with a formal definition of
linear threshold machines. Then in Section 3 we show that
the parity predicate, $\lceil X \text{ contains an odd number of points} \rceil$,
is not of finite rank. This allows us, in Section 4, to apply
techniques of [1] to deduce that, as is the case with finite
order perceptrons, the only topologically invariant predicates
which could be of finite rank are functions of the Euler
characteristic.

In Section 5 we begin to consider the problems of
"infinite" or "arbitrarily large" retinas. We introduce the
notion of uniform linear threshold machine, a linear threshold
machine of fixed rank which can make computations which are
"independent" of the size of the retina. Section 6 gives some
examples of predicates which can be computed, somewhat
surprisingly, by uniform linear threshold machines of rank 2.

Section 7 deals with the Saturation Theorem, our main
technique for obtaining restrictions on the possible computations
of uniform linear threshold machines. Section 8 applies this
to show that, as opposed to the finite order perceptrons, which
can compute the Euler characteristic, the uniform linear
threshold machine cannot compute any non-trivial topological
invariants. Section 9 gives further applications of the
saturation technique and demonstrates the inability of these
machines to recognize figures in context. Section 10 returns
to give a more careful version of the Saturation Theorem, and
shows, for example, that if a linear threshold machine with
bounded coefficients is to escape the saturation phenomenon,
its rank must grow with the size of the retina, albeit very
slowly (as $\log \log |R|$).


2. Linear Threshold Machines

Definition 2.1 A linear threshold function $\varphi$ on a retina
$R$ is a particular kind of predicate computed as follows:
for some real-valued function $u$ on $R$ and a real number
$\theta$ we have

$$\varphi(X) = \left\lceil \sum_{x \in X} u(x) \geq \theta \right\rceil.$$

Here $u$ is called the measure and $\theta$ the threshold associated
to $\varphi$.
Now we combine these functions into machines. First
of all, a Boolean $r$-tuple is defined to be an $r$-tuple
each of whose elements is 0 or 1.

A rank $r$ decision function $\Delta$ is a function defined
on Boolean $r$-tuples which can assume the values 0 or 1.
Finally,

Definition 2.2 A linear threshold machine of rank $r$,

$$M = \Delta\Phi,$$

is a predicate consisting of

(i) an $r$-tuple of linear threshold functions
$\Phi = (\varphi_1, \ldots, \varphi_r)$, and

(ii) a rank $r$ decision function $\Delta$,

such that

$$M(X) = \Delta(\varphi_1(X), \ldots, \varphi_r(X)).$$

This is the class of machines with which we will be
concerned in this paper. The following observations are
clearly true.

1. If the retina has $|R|$ points, then any predicate
on $R$ can be computed by a linear threshold machine of rank
$|R|$.

2. If $M_1, \ldots, M_k$ are linear threshold machines of
rank $r_1, \ldots, r_k$ respectively, then any Boolean function of
the $M_i$ can be computed by a linear threshold machine of rank
$r_1 + r_2 + \cdots + r_k$.
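Observation 1 can be checked mechanically: use one threshold function per point, $\varphi_x(X) = \lceil x \in X \rceil$ (measure 1 at $x$, 0 elsewhere, threshold 1), and let the decision function $\Delta$ be a lookup table. The sketch below assumes Definition 2.1 compares with $\geq$; `threshold_fn`, `machine`, and `universal_machine` are hypothetical names chosen for this illustration.

```python
from itertools import combinations

def threshold_fn(u, theta):
    """Definition 2.1: phi(X) = [sum_{x in X} u(x) >= theta]; u is a dict."""
    return lambda X: int(sum(u.get(x, 0) for x in X) >= theta)

def machine(phis, delta):
    """Definition 2.2: M(X) = Delta(phi_1(X), ..., phi_r(X))."""
    return lambda X: delta(tuple(phi(X) for phi in phis))

def universal_machine(retina, pred):
    """Observation 1: a rank-|R| machine computing an arbitrary predicate.
    Returns (machine, rank)."""
    pts = sorted(retina)
    # One unit per point: measure 1 at x, 0 elsewhere, threshold 1, i.e. [x in X].
    phis = [threshold_fn({x: 1}, 1) for x in pts]
    # Delta is a lookup table indexed by the Boolean |R|-tuple of unit outputs.
    table = {}
    for k in range(len(pts) + 1):
        for X in combinations(pts, k):
            key = tuple(1 if x in X else 0 for x in pts)
            table[key] = pred(frozenset(X))
    return machine(phis, lambda v: table[v]), len(phis)
```

The table has $2^{|R|}$ entries, which is harmless here because the general purpose computer of a linear threshold machine may perform any computation on the unit outputs.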

The definition (2.1) of linear threshold function is
slightly out of line with that used by Minsky and Papert.
When computing $\varphi(X)$ we only take the summation over the
points of $X$ rather than over the entire retina $R$. Two
alternative definitions we might have used are

Definition 2.3 "Order 1 perceptron"

$$\varphi(X) = \left\lceil \sum_{x \in R} a_x\, p(x, X) > \theta \right\rceil,$$

where $p(x, X)$ is a predicate depending only
on whether or not $x \in X$.

Alternatively,

Definition 2.4 "$(-1,1)$ threshold function"

$$\varphi(X) = \left\lceil \sum_{x \in R} a_x\, p(x, X) > 0 \right\rceil,$$

where $p(x, X) = 1$ if $x \in X$ and $-1$ if
$x \notin X$.

It is easy to see that all three of the definitions are
equivalent so long as we are dealing with a fixed finite retina
$R$. If, however, we consider infinite retinas or sequences
of retinas, the different forms (2.3), (2.4) make a
difference. For example, the predicate



is easily expressed as a $(-1,1)$ threshold function


but a type (2.1) threshold function for this same predicate
must involve constants which grow large as the size of the
retina becomes large. We have chosen to work with the form
(2.1) since we wish to make computations which depend only on
the figure $X$ itself, and not explicitly on the retina $R$.
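The equivalence on a fixed finite retina, and why it breaks down for growing retinas, can be seen by converting Definition 2.4 into the form (2.1). Since $\sum_{x \in R} a_x p(x, X) = 2\sum_{x \in X} a_x - \sum_{x \in R} a_x$, the measure becomes $u(x) = 2a_x$ and the threshold depends on the whole retina. The sketch assumes integer weights (so the strict inequality can be replaced by $\geq$ with the threshold raised by 1) and uses majority, $a_x = 1$ for all $x$, purely as our own illustration; it is not necessarily the predicate intended in the example above. The function names are hypothetical.

```python
def pm1_threshold(a):
    """Definition 2.4: [sum_{x in R} a_x p(x, X) > 0], p = +1 on X, -1 off X."""
    return lambda X: int(sum(ax if x in X else -ax for x, ax in a.items()) > 0)

def to_form_2_1(a):
    """Rewrite a (-1,1) threshold function with integer weights in the form (2.1).

    sum_R a_x p(x, X) = 2 * sum_X a_x - sum_R a_x, so the test "> 0" becomes
    sum_X 2 a_x >= sum_R a_x + 1 (the +1 relies on the weights being integers).
    """
    u = {x: 2 * ax for x, ax in a.items()}
    theta = sum(a.values()) + 1
    return u, theta

def form_2_1(u, theta):
    """Definition 2.1: [sum_{x in X} u(x) >= theta]."""
    return lambda X: int(sum(u[x] for x in X) >= theta)
```

For majority on an $n$-point retina the converted threshold is $n + 1$, so the constant of the type (2.1) function grows with the retina even though the $(-1,1)$ form uses only weights equal to 1.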

Finally, there is one more assumption we will make
about the threshold functions, that of finite sensitivity:
the values of the measure cannot be arbitrarily small in
absolute value.

2.5 Hypothesis of finite sensitivity: With each threshold
function there is associated a sensitivity $\varepsilon > 0$ such that, for
any $x \in R$, either $u(x) = 0$ or $|u(x)| \geq \varepsilon$.

This hypothesis will certainly be satisfied for any
linear threshold function built out of actual physical
components, for example, out of optical filters and
photodetectors.


3. Parity

We will be concerned, as in [1], with predicates that
can be computed by linear threshold machines which are
"independent of the retina". Our first attempt at formalizing
this concept is the notion of "finite rank".

Definition 3.1 A predicate $\psi$ is of finite rank $r$ if for
any size retina $R$ there is a linear threshold machine of
rank $r$ which computes $\psi$ on $R$.

In this section we exhibit a predicate which is not
of finite rank. This is the "parity predicate"

$$\psi_{\text{parity}}(X) = \lceil X \text{ contains an odd number of points of } R \rceil.$$

We shall show that, for a linear threshold machine to be able
to recognize parity, its rank must grow at least logarithmically
with the size of the retina. More precisely:

3.2 Parity Theorem. Suppose $M$ is a linear threshold machine
of rank $r$ which computes parity on a retina $R$. Then

$$|R| \leq r\, 2^{r-1}.$$

Before proceeding with the proof, we first introduce
some notation. Let $M = \Delta\Phi$, where $\Phi = (\varphi_1, \ldots, \varphi_r)$ and

$$\varphi_i(X) = \left\lceil \sum_{x \in X} u_i(x) \geq \theta_i \right\rceil.$$

For any $X \subset R$ let

$$S_i(X) = \sum_{x \in X} u_i(x),$$

and let $\sigma_i(X) = \lceil S_i(X) > 0 \rceil$.

Finally, let $\Phi(X)$ be the Boolean $r$-tuple

$$\Phi(X) = (\varphi_1(X), \varphi_2(X), \ldots, \varphi_r(X))$$

and let $\Sigma(X)$ be the Boolean $r$-tuple

$$\Sigma(X) = (\sigma_1(X), \sigma_2(X), \ldots, \sigma_r(X)).$$

Now recall the usual Boolean notion of "implication",
i.e., $0 \to 1$, $1 \to 1$, $0 \to 0$ are all valid, but $1 \to 0$ is
not valid. This extends to a partial order on Boolean $r$-tuples:

3.3 Definition. If $a$ and $b$ are Boolean $r$-tuples, then
we say that $a \leq b$ if $a_i \to b_i$ for $i = 1, 2, \ldots, r$.

Our first step in proving Theorem 3.2 is to show that
any linear threshold machine can be put in "normal form":

3.4 Definition. A linear threshold machine $M = \Delta\Phi$ is said
to be normal if each component linear threshold function
evaluates to zero on the empty set, i.e.,

$$\Phi(\emptyset) = (0, 0, \ldots, 0).$$

This is equivalent to saying that each of the thresholds
$\theta_i$ is positive.

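3.5 Normalization Lemma. If $M$ is a linear threshold machine
of rank $r$, then there is a normal linear threshold machine,
also of rank $r$, which computes the same predicate as $M$.

Proof: Suppose, by reordering, that $\varphi_i(\emptyset) = 1$ for $i = 1, \ldots, k$
and $\varphi_i(\emptyset) = 0$ for $i = k+1, \ldots, r$. We will produce a new
linear threshold machine by modifying the first $k$ threshold
functions. Namely, if

$$\varphi_i(X) = \left\lceil \sum_{x \in X} u_i(x) \geq \theta_i \right\rceil$$

is a linear threshold function, define a new threshold function
$\tilde\varphi_i$ by

$$\tilde\varphi_i(X) = \left\lceil \sum_{x \in X} \bigl(-u_i(x)\bigr) > -\theta_i \right\rceil,$$

so that $\tilde\varphi_i(X) = 1$ if and only if $\varphi_i(X) = 0$. (On a finite
retina the strict inequality can be put in the form $\geq$ of
Definition 2.1 by raising the threshold slightly.) Since
$\theta_i \leq 0$ for $i \leq k$, each $\tilde\varphi_i$ satisfies $\tilde\varphi_i(\emptyset) = 0$. So now let
$M'$ be the linear threshold machine

$$M' = \Delta'\Phi', \qquad \Phi' = (\tilde\varphi_1, \ldots, \tilde\varphi_k, \varphi_{k+1}, \ldots, \varphi_r),$$

$$\Delta'(v_1, \ldots, v_r) = \Delta(1 - v_1, \ldots, 1 - v_k, v_{k+1}, \ldots, v_r).$$

Then $M'$ is normal and computes the same predicate as $M$.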
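The construction in the proof can be sketched directly. Here each unit is stored as a measure (a dict) and a threshold, `normalize` is a hypothetical helper, and the complemented units keep the strict inequality used in the proof rather than converting it to the $\geq$ form.

```python
def normalize(measures, thetas, delta):
    """Sketch of Lemma 3.5.

    measures[i]: dict x -> u_i(x); unit i is phi_i(X) = [S_i(X) >= thetas[i]];
    delta: decision function on Boolean r-tuples.
    Returns (units, M) where every unit is 0 on the empty set and M computes
    the same predicate as the original machine.
    """
    flip = [t <= 0 for t in thetas]          # exactly the units with phi_i(empty) = 1
    units = []
    for u, t, f in zip(measures, thetas, flip):
        if f:
            neg = {x: -w for x, w in u.items()}
            # complement: [sum(-u_i) > -theta_i] is 1 exactly when phi_i is 0
            units.append(lambda X, u=neg, t=-t: int(sum(u.get(x, 0) for x in X) > t))
        else:
            units.append(lambda X, u=u, t=t: int(sum(u.get(x, 0) for x in X) >= t))

    def M(X):
        v = tuple(phi(X) for phi in units)
        # Delta' undoes the complementation before consulting Delta
        return delta(tuple(1 - b if f else b for b, f in zip(v, flip)))

    return units, M
```

For instance, a unit with measure $u = (1, 1)$ and threshold 0 fires on the empty set; `normalize` replaces it by $\lceil -S > 0 \rceil$, which is 0 on the empty set, and $\Delta'$ re-inverts its output.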


The key observation in the proof of Theorem 3.2 is the
following "regularity condition" for linear threshold machines:

Lemma 3.6. Suppose $M = \Delta\Phi$ is a normal linear threshold
machine on $R$. Suppose $X$ and $Y$ are disjoint subsets of
$R$ with $\Phi(X) \leq \Sigma(Y)$. Then

$$\Phi(X) \leq \Phi(X \cup Y) \leq \Sigma(Y).$$

Proof: We will verify that $\varphi_i(X) \to \varphi_i(X \cup Y) \to \sigma_i(Y)$ for
each $i$.

Suppose $\sigma_i(Y) = 0$, so that $S_i(Y) \leq 0$. Then, by
hypothesis, $\varphi_i(X)$ must also be 0, so that $S_i(X) < \theta_i$.
Therefore, since $X$ and $Y$ are disjoint, we have
$S_i(X \cup Y) = S_i(X) + S_i(Y) < \theta_i$, that is, $\varphi_i(X \cup Y) = 0$.
Hence $\varphi_i(X \cup Y) = 0$ whenever $\sigma_i(Y) = 0$, i.e.,

$$\Phi(X \cup Y) \leq \Sigma(Y).$$

Now suppose that $\varphi_i(X \cup Y) = 0$, i.e.,

$$S_i(X) + S_i(Y) < \theta_i.$$

Then

Case 1. If $S_i(Y) \leq 0$, we have $\sigma_i(Y) = 0$ by definition,
so the hypothesis implies that $\varphi_i(X)$ must be 0.

Case 2. If $S_i(Y) > 0$, then the inequality above implies that
$S_i(X) < \theta_i$, i.e., $\varphi_i(X) = 0$.

So, in either case, $\varphi_i(X) = 0$ whenever $\varphi_i(X \cup Y) = 0$,
that is, $\Phi(X) \leq \Phi(X \cup Y)$.
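Lemma 3.6 is easy to probe numerically. The sketch below is a randomized sanity check, not a proof: it draws small integer measures and positive thresholds (so the machine is normal), random disjoint $X$ and $Y$, and, whenever $\Phi(X) \leq \Sigma(Y)$, tests both conclusions of the lemma. Componentwise comparison of 0/1 tuples is exactly the order of Definition 3.3, and $\sigma_i$ is read as $\lceil S_i > 0 \rceil$. All names and parameter values are our own choices.

```python
import random

def leq(a, b):
    """Definition 3.3: a <= b iff a_i -> b_i for every i (0/1 tuples)."""
    return all(ai <= bi for ai, bi in zip(a, b))

def check_regularity(trials=2000, n=6, r=3, seed=0):
    """Return (cases where the hypothesis held, cases violating the lemma)."""
    rng = random.Random(seed)
    tested = violations = 0
    for _ in range(trials):
        u = [[rng.randint(-3, 3) for _ in range(n)] for _ in range(r)]
        theta = [rng.randint(1, 4) for _ in range(r)]   # positive: normal machine

        def S(i, Z):
            return sum(u[i][x] for x in Z)

        def Phi(Z):
            return tuple(int(S(i, Z) >= theta[i]) for i in range(r))

        def Sigma(Z):
            return tuple(int(S(i, Z) > 0) for i in range(r))

        # Random disjoint X, Y: each point goes to neither (0), X (1) or Y (2).
        labels = [rng.randrange(3) for _ in range(n)]
        X = {x for x in range(n) if labels[x] == 1}
        Y = {x for x in range(n) if labels[x] == 2}
        if not leq(Phi(X), Sigma(Y)):
            continue
        tested += 1
        if not (leq(Phi(X), Phi(X | Y)) and leq(Phi(X | Y), Sigma(Y))):
            violations += 1
    return tested, violations
```

Running `check_regularity()` should report a positive number of tested cases and zero violations, as the lemma predicts.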
The next lemma applies these "regularity" considerations
to the parity predicate. First, if $v$ is any Boolean
$r$-tuple, define $\text{ones}(v)$ to be the number of ones in $v$.
(We write $\Sigma(x)$ for $\Sigma(\{x\})$.)

Lemma 3.7. Suppose $M$ is a normal linear threshold machine
which computes parity on a retina $R$, and suppose that
$x_1, x_2, \ldots, x_m$ are distinct points of $R$ with

$$\Sigma(x_1) \leq \Sigma(x_2) \leq \cdots \leq \Sigma(x_m).$$

Then $\text{ones}(\Sigma(x_m)) \geq m$.

Proof: Define subsets of $R$ by

$$V_0 = \emptyset, \qquad V_i = \{x_1, x_2, \ldots, x_i\} \quad (i = 1, \ldots, m).$$

Since $M$ is normal we have $\Phi(V_0) = (0, \ldots, 0) \leq \Sigma(x_1)$, so
Lemma 3.6 applies to give

$$\Phi(V_0) \leq \Phi(x_1 \cup V_0) \leq \Sigma(x_1),$$

that is, $\Phi(V_0) \leq \Phi(V_1) \leq \Sigma(x_1)$.

This inequality, along with the hypothesis, now implies that
$\Phi(V_1) \leq \Sigma(x_2)$, so once again we can apply Lemma 3.6 to obtain

$$\Phi(V_1) \leq \Phi(x_2 \cup V_1) \leq \Sigma(x_2),$$

or, $\Phi(V_1) \leq \Phi(V_2) \leq \Sigma(x_2)$.

Continuing in this manner, we get

$$\Phi(V_0) \leq \Phi(V_1) \leq \cdots \leq \Phi(V_m) \leq \Sigma(x_m).$$

But $V_i$ and $V_{i-1}$ have opposite parity, so $M(V_i) \neq M(V_{i-1})$
and hence $\Phi(V_i) \neq \Phi(V_{i-1})$. Therefore the vector $\Phi(V_i)$
contains at least one more "one" than the vector $\Phi(V_{i-1})$, and
so $\Sigma(x_m) \geq \Phi(V_m)$ must contain at least $m$ ones.

Corollary 3.8. Suppose $M$ is a linear threshold machine
which computes parity on a retina $R$. Suppose that all the
measure functions $u_i$ in the threshold functions for $M$ take
on only positive values. Then

$$\text{rank } M \geq |R|.$$

Proof: The hypothesis implies that $\Sigma(X) = (1, 1, \ldots, 1)$ for
every nonempty subset $X$ of $R$. Therefore, if $x_1, \ldots, x_{|R|}$ are
the points of $R$, we have

$$\Sigma(x_1) = \Sigma(x_2) = \cdots = \Sigma(x_{|R|}),$$

and so Lemma 3.7 implies that

$$\text{ones}(\Sigma(x_{|R|})) \geq |R|.$$

But $\text{ones}(\Sigma(x_{|R|})) \leq$ (number of elements in $\Sigma(x_{|R|})$) $= \text{rank } M$.


The same kind of reasoning as in Corollary 3.8 provides
the proof of the Parity Theorem.

Proof of Theorem 3.2:

Suppose $M$ computes parity and has rank $r$; by Lemma 3.5
we may assume that $M$ is normal. Let

$$B(r) = \sum_v \text{ones}(v),$$

where the sum is taken over all distinct Boolean $r$-tuples
$v$. Then we claim that $|R|$, the size of the retina, must
be less than or equal to $B(r)$. For, consider the $r$-tuples
$\Sigma(x_i)$ as $x_i$ runs through the elements of $R$. If $|R| > B(r)$,
then there must be some $r$-tuple $v$ and points $x_1, \ldots, x_k$ with

$$\Sigma(x_1) = \Sigma(x_2) = \cdots = \Sigma(x_k) = v$$

and $k > \text{ones}(v)$. But this is impossible by Lemma 3.7.
Thus $|R| \leq B(r)$.

Finally, it remains only to compute $B(r)$:

$$B(r) = \sum_{k=0}^{r} k \cdot \bigl(\text{the number of } r\text{-tuples } v \text{ with } \text{ones}(v) = k\bigr) = \sum_{k=0}^{r} k \binom{r}{k} = r\, 2^{r-1}.$$
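The count $B(r) = r\,2^{r-1}$ can be confirmed by brute force for small $r$, and the theorem then yields a lower bound on the rank needed for parity. This is an illustrative sketch with hypothetical function names; the bound is only what Theorem 3.2 implies, not a construction of a machine achieving it.

```python
from itertools import product
from math import comb

def B(r):
    """Sum of ones(v) over all Boolean r-tuples v, by enumeration."""
    return sum(sum(v) for v in product((0, 1), repeat=r))

def B_binomial(r):
    """The same count grouped by k = ones(v): sum_k k * C(r, k)."""
    return sum(k * comb(r, k) for k in range(r + 1))

def parity_rank_lower_bound(retina_size):
    """Least r with retina_size <= r * 2**(r - 1); by Theorem 3.2 any machine
    computing parity on a retina of this size has rank at least this r."""
    r = 1
    while r * 2 ** (r - 1) < retina_size:
        r += 1
    return r
```

For example, a 1000-point retina forces rank at least 8 (since $7 \cdot 2^6 = 448 < 1000 \leq 8 \cdot 2^7 = 1024$), so the required rank grows roughly like $\log_2 |R|$.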


4. Topological Consequences of the Parity Theorem

This section follows Minsky and Papert ([1], Chapter 5)
very closely in deriving consequences of the fact that the
parity predicate is not of finite rank. We deduce that,
as parity is not of finite rank, then neither are such
predicates as

$\lceil X \text{ has two components, one surrounding the other} \rceil$

and so on. We will show that the only topological predicates
which could be of finite rank are those which depend only on the Euler
characteristic of $X$. (In fact, in Section 8, we will show
that even these "Euler predicates" cannot be of finite rank
if we impose certain "uniformity conditions" on our linear
threshold machines.)

Following Minsky and Papert, we show that any scheme
for computing topological invariants (besides the Euler
characteristic) on a class of figures $\{X\}$ must also be able
to compute parity on a class of "derived figures" $\{\hat X\}$.
Hence, any machine which is "confused" by parity must necessarily
also be confused by topological invariants.

This notion of "predicates on derived figures" is made
precise by Minsky and Papert in Section 5.4 of [1]:

Suppose $F$ is a function which associates to any
figure $\hat X$ in $\hat R$ a figure $X = F(\hat X)$ in $R$. Let $\psi$ be a
predicate on $R$. Then we can define a predicate $\hat\psi$ on $\hat R$
by

$$\hat\psi(\hat X) = \psi(F(\hat X)) = \psi(X).$$

In this context, Minsky and Papert formulate the
Collapsing Theorem for Perceptrons ([1], Theorem 3.4.1):

Suppose the function $F$ is such that each point $x$ of $R$
depends on at most one point of $\hat R$, i.e., the points of $R$
fall into four categories:

$x \in X$ for all $\hat X$,

or $x \notin X$ for all $\hat X$,

or there is a point $\hat x \in \hat R$ such that $x \in X$ iff $\hat x \in \hat X$,

or $x \in X$ iff $\hat x \notin \hat X$.

Then $\text{order } \hat\psi \leq \text{order } \psi$. (That is, if $\psi$ can be computed
by a perceptron of order $k$, then so can $\hat\psi$.)

Analogously, we have

Theorem 4.1 (Collapsing Theorem for Linear Threshold Machines)

Suppose that, as above, each point $x$ of $R$ depends
on at most one point of $\hat R$. Then

$$\text{rank } \hat\psi \leq \text{rank } \psi.$$

That is, if $\psi$ can be computed by a linear threshold machine
of rank $r$, then so can $\hat\psi$.

Proof: Suppose $M = \Delta\Phi$ is a linear threshold machine of
rank $r$ which computes $\psi$ on $R$. Let $\varphi_i$, $i = 1, \ldots, r$, be
the linear threshold functions which comprise $\Phi$. Let
$\hat\varphi_i$ be the predicate on $\hat R$ defined by $\hat\varphi_i(\hat X) = \varphi_i(F(\hat X))$.
Recall (2.3) that as long as we are dealing with a fixed
retina (such as $R$ or $\hat R$), "linear threshold functions"
are the same as "order 1 perceptrons". Thus we can apply the
Collapsing Theorem for Perceptrons to deduce that the $\hat\varphi_i$
can be computed as linear threshold functions on $\hat R$. Now
define the linear threshold machine $\hat M$ on $\hat R$ by

$$\hat M = \hat\Delta\hat\Phi, \qquad \hat\Delta = \Delta, \qquad \hat\Phi = (\hat\varphi_1, \ldots, \hat\varphi_r).$$

Then, for any $\hat X \subset \hat R$,

$$\hat M(\hat X) = \Delta(\hat\Phi(\hat X)) = \Delta(\Phi(F(\hat X))) = M(F(\hat X)) = \psi(F(\hat X)) = \hat\psi(\hat X),$$

so that $\hat M$ is a linear threshold machine of rank $r$ which
computes $\hat\psi$.


Corollary 4.2. The predicate

$$\psi_{\text{connected}}(X) = \lceil X \text{ is connected} \rceil$$

is not of finite rank.

Proof: Since we have

1. $\psi_{\text{parity}}$ is not of finite rank,

2. the Collapsing Theorem is true,

the proof is identical to the one given in the context of
finite order perceptrons in Sections 5.5 to 5.7 of [1]. Basically,
the idea is to construct a function

$$F : (\text{figures in } \hat R) \to (\text{figures in } R)$$

such that

$$\psi_{\text{parity}}(\hat X) = \psi_{\text{connected}}(F(\hat X)),$$

and the Collapsing Theorem implies that

$$\text{rank } \psi_{\text{parity}} \leq \text{rank } \psi_{\text{connected}}.$$

See [1] for details.


The techniques of [1] also allow us to deduce that the
only topologically invariant predicates which could be of
finite rank must be functions of the Euler characteristic.

Definition 4.3. A predicate $\psi$ is said to be topologically
invariant if $\psi(X) = \psi(Y)$ whenever $X$ and $Y$ are
topologically equivalent (i.e., $X$ and $Y$ can be "continuously
deformed" into one another).

Corollary 4.4. Let $\psi$ be a topologically invariant predicate
of finite rank. Suppose $X$ and $Y$ are figures with the
same Euler characteristic. Then $\psi(X) = \psi(Y)$.

Proof: The proof exactly follows Theorem 5.9 of [1], which
proves the corresponding result for finite order perceptrons.
The idea is based on a construction due to Paterson which
reduces the computation of $\psi$ modulo Euler characteristic
to the computation of the parity of certain derived figures.
See [1] for details.


5. Infinite Retinas; Uniform Linear Threshold Machines

In demonstrating that predicates such as parity and
connectedness are not of finite rank, we considered a fixed,
finite retina and found lower bounds for the rank of any
linear threshold machine which computes these predicates. The
lower bound becomes large as the size of the retina becomes
large; hence the predicates are not of finite rank.

But the intuitive concept of "finite rank" carries
a somewhat stronger connotation. Namely, we would like to
think of a "finite rank" predicate as one which can be
somehow computed by a fixed linear threshold machine which works
regardless of the size of the retina. We formalize this notion
below in the definition of uniform linear threshold machine.

Definition 5.1. By an "infinite retina" $R$ we will mean
an increasing union of retinas

$$R^1 \subset R^2 \subset R^3 \subset \cdots.$$

A uniform linear threshold function $\varphi$ on $R$ is a
compatible collection of linear threshold functions

$$\varphi^i(X) = \left\lceil \sum_{x \in X} u^i(x) \geq \theta^i \right\rceil, \qquad X \subset R^i,$$

where $u^i$ is a measure function on $R^i$. By "compatible
collection" we mean

1) if $R^i \subset R^j$, then $u^j$ restricted to $R^i$ is the same as $u^i$;

2) all the $\theta^i$ are the same.

Thus, it makes good sense, for finite figures $X$ in
$R$, to write

$$\varphi(X) = \left\lceil \sum_{x \in X} u(x) \geq \theta \right\rceil,$$

where $u$ is a well-defined function on the infinite retina
$R$.

Definition 5.2. A uniform linear threshold machine of rank $r$
on $R$ is a predicate $M = \Delta\Phi$, where $\Phi = (\varphi_1, \ldots, \varphi_r)$ is
an $r$-tuple of uniform linear threshold functions and $\Delta$
is a rank $r$ decision function.

Intuitively, then, we allow our machines to operate on
larger and larger retinas by hooking up more and more inputs
to the linear threshold functions. The thresholds $\theta$ as well
as the decision function $\Delta$ remain unchanged.

Notice that we make no requirement that the measure
functions $u$ remain bounded as the retina gets large.

Also, we could have defined our "uniform threshold
functions" based on one of the other definitions of "linear
threshold function", 2.3 or 2.4. These would lead to different
classes of machines. However, Definition 5.1 seems more natural
since, for a fixed figure on an "arbitrarily large" retina,
the threshold summations need extend only over the points of
the figure. This seems to capture the intuitive notion of
"computations which depend only on the figure itself, not on
the entire infinite retina".






6. Stratification; Predicates of Rank 2

Much of this paper is concerned with proving that
various geometric predicates are not of finite rank. In this
section, by way of contrast, we show how certain "symmetry"
predicates can be computed by uniform linear threshold
machines of rank 2. These results are reminiscent of the
"stratification phenomenon" discussed in Chapter 7 of [1].
This consists, roughly, in using very large coefficients to
encode geometric information, thus allowing certain predicates
to be computed by simpler machines than might have been thought
necessary. The details of this technique for linear threshold
machines differ from those given in [1] for perceptrons.
However, the results have the same flavor in both cases, and
so we retain the name "stratification".

Theorem 6.1 (Rank 2 Stratification).

Let $S_1, S_2, \ldots$ be a sequence of disjoint finite subsets
of $R$. Let $\psi_i$ be the predicate

$$\psi_i(X) = \lceil \text{either } S_i \subset X \text{ or } S_i \cap X = \emptyset \rceil.$$

Then $\psi = \bigwedge_i \psi_i$ can be computed by a uniform linear threshold
machine of rank 2.

(Note: Each $S_i$ must itself be a finite set. But there may
be infinitely many distinct $S_i$'s.)
Proof: For each $S_i$, pick a "base point" $b_i \in S_i$. Let
$n_i = (\text{number of elements in } S_i) - 1$. Define the function
$u$ as follows:

$$u(x) = 1 \ \text{ for } x \in S_1 - b_1, \qquad u(b_1) = -n_1,$$

and inductively

$$u(x) = m_i \ \text{ for } x \in S_i - b_i, \qquad u(b_i) = -n_i m_i, \qquad \text{where } m_i = 1 + \sum_{j=1}^{i-1} \sum_{y \in S_j} |u(y)|,$$

and $u(x) = 0$ for $x$ not contained in any $S_i$.
Then define

$$\varphi_1(X) = \left\lceil \sum_{x \in X} u(x) \geq 0 \right\rceil, \qquad \varphi_2(X) = \left\lceil \sum_{x \in X} -u(x) \geq 0 \right\rceil.$$

We claim that $\psi(X)$ is true if and only if $\varphi_1(X)$
and $\varphi_2(X)$ are both true, i.e., if and only if

$$\sum_{x \in X} u(x) = 0.$$

To see this, note first that

$$\sum_{x \in S_i} u(x) = 0$$

by choice of $u$. Now if $\psi_i(X)$ is true, we have that either
$S_i \subset X$ or $X \cap S_i = \emptyset$. In either case, then,

$$\sum_{x \in S_i \cap X} u(x) = 0.$$

So if $\psi(X)$ is true, we have (since $u = 0$ outside the $S_i$)

$$\sum_{x \in X} u(x) = \sum_i \sum_{x \in S_i \cap X} u(x) = 0.$$

Conversely, suppose $\psi(X)$ is false and let $I$ be the largest
value of $i$ for which $\psi_i(X)$ is false. (Recall that we are
only concerned with finite figures $X$, so that $I$ exists.)
Let

$$A_I = \sum_{x \in X \cap S_I} u(x) \qquad \text{and} \qquad A' = \sum_{\substack{i < I \\ \psi_i(X) \text{ false}}} \ \sum_{x \in X \cap S_i} u(x).$$

Then

$$\sum_{x \in X} u(x) = A_I + A'.$$

By construction of $u$, we have $|A_I| \geq m_I$, since $X \cap S_I$ is a
nonempty proper subset of $S_I$ (at least one point of $S_I$ is in $X$
and at least one is not). Also, by construction,
$|A'| \leq \sum_{j < I} \sum_{y \in S_j} |u(y)| = m_I - 1$.
Thus $|A_I| > |A'|$, so $A_I + A' \neq 0$.

Finally, notice that this construction will provide
uniform linear threshold functions $\varphi_1, \varphi_2$ on a retina
sequence $R: R^1 \subset R^2 \subset \cdots$. We need only make sure that the
"higher numbered" sets $S_i$ appear in the higher numbered
retinas $R^j$. The crucial point is that we can "enlarge the
retina", adding more $S_i$, without changing the value of $u$ on
the lower numbered $S_i$.
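The weights in the proof can be generated mechanically. The sketch below is our own illustration: sets are given as lists whose first element serves as the base point $b_i$, points outside every $S_i$ get $u = 0$, and the rank 2 machine accepts exactly when $\sum_{x \in X} u(x) = 0$. For two-element sets, such as mirror pairs, the recurrence gives $m_{i+1} = 3m_i$, so the weights grow like $3^{i-1}$ (a consequence we computed from the construction, not a figure stated in the text). The function names are hypothetical.

```python
def stratify(sets):
    """Weights u from the proof of Theorem 6.1.

    sets: disjoint finite lists S_1, S_2, ...; S_i[0] plays the role of b_i.
    Returns u as a dict; points not in any S_i implicitly have u(x) = 0.
    """
    u = {}
    for S in sets:
        m = 1 + sum(abs(w) for w in u.values())   # m_i = 1 + sum over S_1..S_{i-1}
        base, rest = S[0], S[1:]
        for x in rest:
            u[x] = m
        u[base] = -len(rest) * m                  # u(b_i) = -n_i m_i, so S_i sums to 0
    return u

def rank2_machine(u):
    """phi_1 = [sum u >= 0], phi_2 = [sum -u >= 0], Delta = AND."""
    def M(X):
        s = sum(u.get(x, 0) for x in X)
        phi1, phi2 = int(s >= 0), int(-s >= 0)
        return phi1 & phi2
    return M
```

For instance, on a six-point line retina, `stratify([[0, 5], [1, 4], [2, 3]])` assigns weights 1, 3, 9 to the points 5, 4, 3 and -1, -3, -9 to their base points 0, 1, 2, and the resulting machine accepts exactly the figures that are symmetric about the center.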

6.2 Examples.

The following predicates all have rank 2:

(a) Draw a vertical line $L$ down the center of the
retina. Define $\psi$ by

$$\psi(X) = \lceil X \text{ is symmetric with respect to } L \rceil.$$

Here the sets $S_i$ have two elements, consisting of a point $x$
along with its reflection in $L$. Following through the proof of 6.1,
we see that the weight

(b) More generally, let $G$ be a finite group acting
on $R$. Then

$$\psi(X) = \lceil X \text{ is invariant under } G \rceil$$

has rank 2. Take the $S_i$ to be the orbits of points of $R$
under the $G$-action, i.e.,

$$S_i = \bigcup_{g \in G} g(x)$$

for some $x \in R$.

(c) Pick a point $x \in R$. Then

$$\psi(X) = \lceil X \text{ is a bull's-eye centered about } x \rceil$$

has rank 2. Take the $S_i$ to be "concentric rings" about $x$.


7. Saturation

We now turn to some predicates which cannot be computed
by uniform linear threshold machines. These include, for
example, predicates which recognize any topological invariant
and predicates which recognize figures in context.

The main technique for obtaining these results is the
Saturation Theorem. This says, roughly, that linear threshold
functions will become "overloaded" as the retina becomes large.
Consequently, parts of figures may become "invisible" to a
linear threshold machine. We formalize this in the notions of
"saturation" and "saturation sequence".

Definition 7.1 Suppose that $M$ is a uniform linear threshold
machine on a retina $R: R^1 \subset R^2 \subset \cdots$, and that $A$ and $B$
are subsets of $R$ with $A \subset R^a$ and $B \subset R^b$. Then we
say that $B$ saturates $M$ with respect to $A$ on $R^a$ if
for any $S \subset R^a$ we have

$$M(B \cup S) = M(A \cup B \cup S).$$

(See Figure 3.)

Intuitively, the idea is that $B$ "overwhelms" the
decision elements of $M$ to such an extent that $M$ cannot
"see" $A$.
Figure 3

$B$ saturates $M$ with respect to $A$ on $R^a$

Definition 7.2 Suppose $M$ is a uniform linear threshold
machine on $R$ and that $\{A_i \subset R^{l(i)}\}$ is a sequence of subsets
of $R$. (Here $R^{l(i)}$ represents an expanding collection
of retinas in $R$.) Then we say that $\{A_i\}$ is a saturation
sequence if there exists an integer $N$ such that

$$A_2 \cup A_3 \cup \cdots \cup A_N$$

saturates $M$ with respect to $A_1$ on $R^{l(1)}$.
The r;ain result about saturation is now: 
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Z-±3 Satu ra tion Tbnoreir. ]jet 
machine On .ft and let c 

of disjoint seta. Than [A^] 
is a saturation sequence. 


M be i uniform linear threshold 
be any infinite sequence 
contains a subsequence which 


and : = (tp] * . ■ ^<p r ) 


j S' ^ r, ^ j ( x ) > ,i 

xex 


S J« X > • l u,(x> 


xex 


Proof: Let M 


As in §3 let 


Define the number 


v t (Xj by 



if Sjfx] « 0 
if Ej(x) > 0 
if Sj(x} < a 


and let F(X) he the r’tuple 


r(x) . (Y 1 (x),v p (x),,.. (V;r (JO) 


Since there are only a finite number of possible values for 
r(X) f there must be an infinite subsequence of the {} for 
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ubich r(A^) takes the same fixed value- We claim that this 
is the desired saturation sequence,, 

To prove this* first renumber the A so that 

A jA f -.. is the subsequence picked out above* Also for 

J d 

convenience renumber the the R^'s so that A H t R 1 t Mow 
let 


M 


max 

J=lj + i * j P 


max abE(S.(X)) 
i J 

X C JT 


Let 


T 


max 


*1-1, * 


abs 0. 


and choose N > ™ +■ 1 

where s is the minimum sensitivity of the linear threshold 
functions tp^,.***^ - (Recall §2.5- 1 

Let A = and let B = A 2 U A^ \[ .I) A fJ and let 
S be any subset Of R^" ■ We will Show tha.t 


?>j(A U B ll E) = q-j.[L M £} 

for j = 1,2,*..r * This will prove the theorem. 


Case 1 * Suppose ue have a J for which £j (A) *= 0 . Nencu 
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£^{B) and S^(A u B) are also 0 , so 

SjU U 3 U £) -■= Sj(S) - Sj (li y S) 

and therefore f,(A y B V S] » J.fB U B] . 

J J 

gasa "- ■ Suppose Sj(A) > O . Then, by (2.$) we have < e 

and so 

Sj(B) (H ‘ l)fi = T + M 

also, by choice of ft we have 

[Sj( s )f <_ M 60 £^(3 jj S) >_ T > 9 

and Sj(A u 3 ij £) > T +• e > Gj , 

Hence $.*(A tf B U S) - ^(S u S) - 1 . 

Case 3. Suppose $S_j(A) < 0$. Then, as above, we have

$$S_j(A) \le -e \quad\text{and}\quad S_j(B) \le -(N-1)e < -T - M .$$

Also $|S_j(S)| \le M$, so

$$S_j(B \cup S) < -T \le \theta_j \quad\text{and}\quad S_j(A \cup B \cup S) < -T - e < \theta_j .$$

Hence $\varphi_j(A \cup B \cup S) = \varphi_j(B \cup S) = 0$.

This completes the proof. 
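The proof is constructive, so it can be traced on a small example. The following Python sketch is our own illustration rather than anything in the paper: each $\varphi_j$ is represented (an assumption) as a weight dictionary $a_j$ paired with a threshold $\theta_j$, figures are sets of cell labels, a finite list of disjoint sets stands in for the infinite sequence $\{A_i\}$, and the sensitivity $e$ is taken to be the smallest nonzero $|S_j(A_i)|$ among the chosen sets, which is all the argument above actually uses. All function names are hypothetical.

```python
from itertools import combinations


def S(weights, X):
    """Measure S_j(X): the sum of the weights a_j(x) over the cells x of X."""
    return sum(weights.get(x, 0) for x in X)


def phi(weights, theta, X):
    """Linear threshold predicate: 1 if S_j(X) > theta_j, else 0."""
    return int(S(weights, X) > theta)


def signature(machine, X):
    """Gamma(X): the sign (-1, 0 or +1) of each S_j(X)."""
    return tuple((S(w, X) > 0) - (S(w, X) < 0) for w, _ in machine)


def bound_M(machine, retina1):
    """max over j and X in R^1 of |S_j(X)|: the larger weight total by sign."""
    best = 0
    for w, _ in machine:
        pos = sum(w.get(x, 0) for x in retina1 if w.get(x, 0) > 0)
        neg = -sum(w.get(x, 0) for x in retina1 if w.get(x, 0) < 0)
        best = max(best, pos, neg)
    return best


def saturation_sequence(machine, sets, retina1):
    """Return (A, sat(A)) following the proof of the Saturation Theorem."""
    # Finite stand-in for the infinite subsequence on which Gamma is constant:
    # the largest group of sets sharing one signature.
    groups = {}
    for A in sets:
        groups.setdefault(signature(machine, A), []).append(A)
    seq = max(groups.values(), key=len)
    M = bound_M(machine, retina1)
    T = max(abs(theta) for _, theta in machine)
    # Sensitivity: the smallest nonzero |S_j(A_i)| on the chosen sets.
    e = min((abs(S(w, A)) for A in seq for w, _ in machine if S(w, A) != 0),
            default=1)
    N = int((M + T) // e) + 2          # least integer N with N > (M + T)/e + 1
    if len(seq) < N or not seq[0] <= retina1:
        raise ValueError("too few sets, or the first set is not inside R^1")
    return seq[0], frozenset().union(*seq[1:N])


def saturates(machine, A, B, retina1):
    """Brute force: phi_j(A | B | S) == phi_j(B | S) for every S in R^1 and j?"""
    cells = sorted(retina1)
    for k in range(len(cells) + 1):
        for extra in combinations(cells, k):
            X = B | frozenset(extra)
            if any(phi(w, t, A | X) != phi(w, t, X) for w, t in machine):
                return False
    return True


# Usage on a toy machine over cells 0..59 of a single row (hypothetical weights).
retina1 = set(range(6))
machine = [({x: 1 for x in range(60)}, 2),                       # more than 2 points
           ({x: 1 if x % 2 == 0 else -1 for x in range(60)}, 0)]  # even beat odd
sets = [frozenset({3 * i}) for i in range(20)]                    # disjoint points
A, B = saturation_sequence(machine, sets, retina1)
print(sorted(A), len(B), saturates(machine, A, B, retina1))       # [0] 9 True
```

The brute-force check `saturates` is exponential in $|R^1|$ and is only meant to confirm the counting argument on tiny retinas.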

To use the Saturation Theorem we proceed as follows: first find a figure $X$ which we would like to make "invisible" to $M$, then embed $X$ in a saturation sequence $X = X_1, X_2, \ldots, X_N$ so that

$$X_2 \cup X_3 \cup \cdots \cup X_N \overset{\mathrm{df}}{=} \operatorname{sat}(X)$$

saturates $M$ with respect to $X$. The following proposition illustrates the technique.

Let $\psi_{\mathrm{adj}}$ be the predicate

$$\psi_{\mathrm{adj}}(X) = \lceil X \text{ contains at least two adjacent points of } R \rceil .$$

Proposition 7.4. The predicate $\psi_{\mathrm{adj}}$ cannot be computed by a uniform linear threshold machine.


Proof: Suppose $M$ is a uniform linear threshold machine. Let $\{A_i\}$ be a sequence of single points of $R$, spaced at least 3 apart. By Theorem 7.3, $\{A_i\}$ contains a saturation subsequence $B_1, B_2, \ldots$. Then, as indicated above, let $B = B_1$ and let

$$\operatorname{sat}(B) = B_2 \cup \cdots \cup B_N$$

saturate $M$ with respect to $B$ on $R^1$. Now let $S$ be the figure consisting of a single point of $R^1$ adjacent to $B$ but not adjacent to any of the other $B_i$. Then, by saturation,

$$M(\operatorname{sat}(B) \cup S) = M(B \cup \operatorname{sat}(B) \cup S),$$

but $\psi_{\mathrm{adj}}(\operatorname{sat}(B) \cup S)$ is false while $\psi_{\mathrm{adj}}(B \cup \operatorname{sat}(B) \cup S)$ is true. (See Figure 4.)



Figure 4

Saturation sequence for $\psi_{\mathrm{adj}}$

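The contradiction in the proof of Proposition 7.4 can be checked on a toy machine. In this hypothetical Python sketch the retina is simplified to a single row of cells (so two cells are adjacent when their labels differ by 1), $R^1$ is taken to be cells 0 to 5, and `sat_B` is hand-picked: for this machine the counting in the proof of Theorem 7.3 gives $M = 6$, $T = 1$, $e = 1$, so $N = 10$ sets ($B$ and nine further points) satisfy $N > (M+T)/e + 1 = 8$.

```python
def phis(machine, X):
    """The vector (phi_1(X), ..., phi_r(X)); each phi_j is a (weights, theta) pair."""
    return tuple(int(sum(w.get(x, 0) for x in X) > theta) for w, theta in machine)


def psi_adj(X):
    """1 if X contains two adjacent cells of a one-row retina, else 0."""
    return int(any(x + 1 in X for x in X))


def fooled(machine, B, sat_B, S):
    """True if the machine sees the same phi-vector on sat(B) | S and on
    B | sat(B) | S even though psi_adj differs between the two figures."""
    without_B, with_B = sat_B | S, B | sat_B | S
    return (phis(machine, without_B) == phis(machine, with_B)
            and psi_adj(without_B) != psi_adj(with_B))


# A toy machine on cells 0..59 of a single row (hypothetical weights).
machine = [({x: 1 for x in range(60)}, 1),                        # more than 1 point
           ({x: 1 if x % 2 == 0 else -1 for x in range(60)}, 0)]  # even beat odd
B, S = {0}, {1}                        # S is adjacent to B
sat_B = set(range(6, 55, 6))           # nine points spaced 6 apart
print(fooled(machine, B, sat_B, S))    # True
print(fooled(machine, B, set(), S))    # False: without sat(B) the first unit notices
```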


Remark: This proposition stands in sharp contrast to the perceptron case, where $\psi_{\mathrm{adj}}$ is easily computed by a perceptron of order 2.

8. Topological Invariants.

We have already seen (4.4) that the only topological invariants which could be computed by a finite rank linear threshold machine are those which depend only on the Euler characteristic. Now we apply the Saturation Theorem to conclude that not even these "Euler predicates" are computable in a uniform way, and that consequently uniform linear threshold machines cannot compute any non-trivial topological invariant.

Theorem 8.1. Suppose $M$ is a uniform linear threshold machine such that $M(X_1) = M(X_2)$ whenever $X_1$ and $X_2$ are topologically equivalent. Then, in fact, $M(X) = M(Y)$ for any non-empty sets $X$ and $Y$.

Proof: Let $T$ denote the annulus illustrated in Figure 5. Let $X$ be any non-empty figure. We will show that $M(X) = M(T)$.



Figure 5

The standard annulus, $T$

















Step 1. Let $e(X)$ be the Euler characteristic of $X$. By applying 4.4 and choosing the retina large enough, we have that $M(X)$ is equal to $M(Y)$ for $Y$ one of the "canonical figures with Euler characteristic $e(X)$", i.e.

$e(X)$ disjoint squares if $e(X) > 0$,

or

a $(1 - e(X))$-holed annulus if $e(X) \le 0$

(see Figure 6). Thus we need only show that $M(X) = M(T)$ for $X$ equal to any of these canonical forms.


Figure 6

Canonical figures for Euler characteristic (panels: $e(X) > 0$ and $e(X) \le 0$)


























Step 2. In the retina sequence $\mathcal{R}$ choose a sequence of disjoint copies of $T$. Now use the Saturation Theorem 7.3 to find a sequence $T_1, T_2, \ldots, T_N$ so that

$$T_2 \cup \cdots \cup T_N = \operatorname{sat}(T_1)$$

saturates $M$ with respect to $T_1$ on $R^1$. (See Figure 7.) Notice that by (4.4) we have $M(X) = M(X \cup \operatorname{sat}(T_1))$, since these sets have the same Euler characteristic (each annulus $T_i$ has Euler characteristic 0).



Figure 7

A saturation sequence of annuli

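Steps 1 and 2 rely on the Euler characteristic of grid figures. As an illustrative sketch (our own, not from the paper), a figure can be treated as a union of closed unit squares and its Euler characteristic computed as $V - E + F$; under this convention squares meeting only at a corner count as touching, an assumption the text does not discuss. The helper names `annulus` and `squares` are hypothetical builders for the canonical figures of Step 1.

```python
def euler_characteristic(cells):
    """V - E + F for the union of the closed unit squares at the (row, col) cells."""
    vertices, edges = set(), set()
    for r, c in cells:
        vertices.update({(r, c), (r + 1, c), (r, c + 1), (r + 1, c + 1)})
        # Each edge is stored as an ordered pair of endpoints, so an edge shared
        # by two neighbouring squares is counted only once.
        edges.update({((r, c), (r, c + 1)), ((r + 1, c), (r + 1, c + 1)),
                      ((r, c), (r + 1, c)), ((r, c + 1), (r + 1, c + 1))})
    return len(vertices) - len(edges) + len(set(cells))


def annulus(holes, left=0):
    """A 3-row frame with `holes` single-cell holes; one hole gives an annulus like T."""
    width = 2 * holes + 1
    return {(r, left + c) for r in range(3) for c in range(width)
            if not (r == 1 and c % 2 == 1)}


def squares(n):
    """n disjoint unit squares in a row, spaced so that they share no corners."""
    return {(0, 2 * i) for i in range(n)}


T = annulus(1)
sat_T = annulus(1, left=10) | annulus(1, left=20)    # disjoint copies of T
print(euler_characteristic(T))                       # 0
print(euler_characteristic(annulus(3)))              # -2, i.e. 1 - 3
print(euler_characteristic(squares(4)))              # 4
print(euler_characteristic(annulus(2) | sat_T))      # -1, same as annulus(2)
print(euler_characteristic(T | {(1, 3)}))            # 0: a square glued to T
```

The last line mirrors Case 2 of Step 3 below: a square placed directly against an annulus leaves the Euler characteristic at 0.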










Step 3. Case 1: Suppose $e(X) < 0$, so that the canonical form for $X$ is an $n$-holed annulus. Consider the set $X \cup \operatorname{sat}(T_1)$. By topological invariance we can deform $X \cup \operatorname{sat}(T_1)$ without changing the value of $M$ so that the "end-position hole" of $X$ moves over to become $T_1$ (Figure 8); that is, $X$ deforms to $\bar X \cup T_1$, where $\bar X$ has $(n-1)$ holes.



Figure 8

$X$ deforms to $\bar X \cup T_1$















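Thus we have

$$\begin{aligned} M(X) &= M(X \cup \operatorname{sat}(T_1)) \\ &= M(\bar X \cup T_1 \cup \operatorname{sat}(T_1)) \\ &= M(\bar X \cup \operatorname{sat}(T_1)) = M(\bar X), \end{aligned}$$

using topological invariance for the second equality, saturation for the third, and (4.4) for the last. Proceeding inductively, we can reduce the number of holes of $X$ one by one until there is only one hole left, i.e., $X$ reduces to an annulus.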

Case 2: Suppose that $e(X) > 0$, so that the canonical form for $X$ is $n$ disjoint squares. Let $S_n$ denote the "end-most" square and write $X = \bar X \cup S_n$. Consider again the set $X \cup \operatorname{sat}(T_1)$. Once again the value of $M$ is unchanged if we deform this set by moving $S_n$ over to be adjacent to the position occupied on the retina by $T_1$. (See Figure 9.)



Figure 9

$X \cup T_1$ deforms to $\bar X \cup S_n \cup T_1$

Thus

$$\begin{aligned} M(X) &= M(X \cup \operatorname{sat}(T_1)) = M(\bar X \cup S_n \cup \operatorname{sat}(T_1)) \\ &= M(\bar X \cup S_n \cup T_1 \cup \operatorname{sat}(T_1)), \end{aligned}$$

where the last equality uses saturation to add in $T_1$. But if $S_n$ is directly adjacent to $T_1$, then the set $S_n \cup T_1$ is itself topologically an annulus and so has Euler characteristic zero. Thus, by (4.4),

$$M(\bar X \cup S_n \cup T_1 \cup \operatorname{sat}(T_1)) = M(\bar X) \quad (\text{as long as } \bar X \ne \emptyset).$$

Proceeding in this way, we can eliminate the squares of $X$ one by one.

This completes the proof.


9. Figures in Context.

We recall the following definition from §6.6 of [1]:


Definition 9.1. If $\psi$ is a predicate, then define a new predicate $\psi_{\text{in context}}$ by

$$\psi_{\text{in context}}(X) = \lceil \psi(Y) \text{ for some connected component } Y \text{ of } X \rceil .$$


Papert and Minsky show that, for such predicates as $\psi(X) = \lceil X \text{ is a hollow square} \rceil$, $\psi_{\text{in context}}$ cannot be computed by a finite order perceptron. In this section we show that uniform linear threshold machines can compute $\psi_{\text{in context}}$ for only the most trivial kind of predicate $\psi$.

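Definition 9.1 and its "negation" (the predicate that requires $\psi$ on every component) are short to state in code. In this illustrative Python sketch, cells are (row, col) pairs, components are taken with 4-adjacency (an assumption; the text does not fix the adjacency), and the hypothetical `is_square` tests for a filled $k \times k$ block. The two wrappers differ only in `any` versus `all`.

```python
def components(X):
    """Connected components of a set of (row, col) cells, using 4-adjacency."""
    remaining, comps = set(X), []
    while remaining:
        stack = [remaining.pop()]
        comp = set(stack)
        while stack:
            r, c = stack.pop()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    comp.add(nb)
                    stack.append(nb)
        comps.append(frozenset(comp))
    return comps


def in_context(psi):
    """psi_in context: true when psi holds on SOME connected component."""
    return lambda X: any(psi(Y) for Y in components(X))


def for_all_components(psi):
    """The 'negation' form: true when psi holds on EVERY connected component."""
    return lambda X: all(psi(Y) for Y in components(X))


def is_square(Y):
    """True if the nonempty set Y is a filled k-by-k block of cells."""
    rows = [r for r, _ in Y]
    cols = [c for _, c in Y]
    side = max(rows) - min(rows) + 1
    return side == max(cols) - min(cols) + 1 and len(Y) == side * side


block = {(0, 0), (0, 1), (1, 0), (1, 1)}     # a filled 2x2 square
domino = {(5, 5), (5, 6)}                    # not a square
X = block | domino
print(len(components(X)))                    # 2
print(in_context(is_square)(X))              # True: one component is a square
print(for_all_components(is_square)(X))      # False: the domino is not
```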

Definition 9.2. We say that a predicate $\psi$ is divisible if $\psi$ satisfies the following condition: for every connected set $X$ on which $\psi$ is true, if we divide $X$ into two disjoint connected sets $X = A \cup B$, then $\psi(A)$ is true or $\psi(B)$ is true.

We can see that most "interesting" geometric predicates are not divisible. For example, if $\psi(X)$ is true for a connected figure $X$ and $\psi$ is divisible, then, by continual subdividing, we see that $\psi$ must be true on the set consisting of one single square of $X$. Consequently, any predicate which is both divisible and translation invariant, and true on at least one connected figure, must be true on the figure consisting of a single square. Not all predicates which are true on single squares are divisible. Figure 10, for example, illustrates that $\lceil X \text{ is a square} \rceil$ is not divisible.

Figure 10

$\lceil X \text{ is a square} \rceil$ is not divisible

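Definition 9.2 lends itself to a brute-force check on small figures. The sketch below is our own: cells are (row, col) pairs, connectivity is 4-adjacency, and the hypothetical `is_square` means a filled $k \times k$ block. It searches every split of a connected figure into two disjoint connected pieces for one on which the predicate fails on both pieces. For the $2 \times 2$ square it returns the split into two dominoes, one witness that $\lceil X \text{ is a square} \rceil$ is not divisible (Figure 10 may show a different split).

```python
from itertools import combinations


def connected(Y):
    """4-connectivity test for a nonempty set of (row, col) cells."""
    Y = set(Y)
    start = next(iter(Y))
    seen, stack = {start}, [start]
    while stack:
        r, c = stack.pop()
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb in Y and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == Y


def is_square(Y):
    """True if the nonempty set Y is a filled k-by-k block of cells."""
    rows = [r for r, _ in Y]
    cols = [c for _, c in Y]
    side = max(rows) - min(rows) + 1
    return side == max(cols) - min(cols) + 1 and len(Y) == side * side


def non_divisibility_witness(psi, X):
    """Return a split X = A | B into disjoint connected pieces with psi false
    on both, or None if every such split keeps psi true on some piece."""
    cells = sorted(X)
    for k in range(1, len(cells)):
        for part in combinations(cells, k):
            A = frozenset(part)
            B = frozenset(cells) - A
            if connected(A) and connected(B) and not psi(A) and not psi(B):
                return A, B
    return None


block = {(0, 0), (0, 1), (1, 0), (1, 1)}
A, B = non_divisibility_witness(is_square, block)
print(sorted(A), sorted(B))     # [(0, 0), (0, 1)] [(1, 0), (1, 1)]
```

The search is exponential in the size of the figure, so it is only a way to experiment with the definition on very small examples.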

Theorem 9.3. Suppose $M$ is a uniform linear threshold machine which computes $\psi_{\text{in context}}$ for some translation invariant predicate $\psi$. Then $\psi$ must be divisible.

Proof: Suppose $\psi$ is not divisible. Then there exist disjoint connected figures $A$ and $B$ such that $X = A \cup B$ is connected, $\psi(A) = \psi(B) = 0$ and $\psi(X) = 1$. Choose a saturation sequence $B_1, B_2, \ldots$ of sets which are all congruent to $B$, and translate $A$ so that $A \cup B_1$ is congruent to $X$. (See Figure 11.)

Figure 11

Saturation sequence for $\psi_{\text{in context}}$


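Then we have $\psi(B_i) = 0$ for $i = 1, \ldots, N$. Letting $B_2 \cup \cdots \cup B_N = \operatorname{sat}(B_1)$, with the sets kept apart from one another as in Figure 11, we have

$$\psi_{\text{in context}}(A \cup \operatorname{sat}(B_1)) = 0$$

but

$$\psi_{\text{in context}}(A \cup B_1 \cup \operatorname{sat}(B_1)) = 1,$$

since the component $A \cup B_1$ is congruent to $X$. On the other hand, by saturation we must have

$$M(A \cup \operatorname{sat}(B_1)) = M(A \cup B_1 \cup \operatorname{sat}(B_1)),$$

a contradiction.

The "negation" of $\psi_{\text{in context}}$ is given by

$$\psi_{\forall}(X) = \lceil \psi(Y) \text{ for all connected components } Y \text{ of } X \rceil .$$

We leave it to the reader to formulate and prove the corresponding theorem for $\psi_{\forall}$; e.g.,

$$\lceil \text{every component of } X \text{ is a } \ldots \rceil$$

cannot be computed by a uniform linear threshold machine.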


10. Bounds for Saturation.

We have shown that uniform linear threshold machines which purport to recognize even very simple predicates must eventually fail on arbitrarily large retinas. But how large is "arbitrarily large"? This section provides a bound, albeit a rather weak one, in terms of constants associated with the machines.


Definition 10.1. Let $M$ be a uniform linear threshold machine of rank $r$ on $\mathcal{R}$:

$$M = (\varphi_1, \varphi_2, \ldots, \varphi_r), \qquad \varphi_i(X) = \Bigl\lceil \sum_{x \in X} a_i(x) > \theta_i \Bigr\rceil .$$

Let $S_i(X) = \sum_{x \in X} a_i(x)$, let $S(X) = \max_{i=1,\ldots,r} |S_i(X)|$, let

$$T = \max_{i=1,\ldots,r} |\theta_i| ,$$

and let $e$ be the minimum sensitivity (2.5) of the $\varphi_i$. Now define $m(n)$ and $b(n)$ by

$$m(0) = \max_{X \subseteq R^1} S(X), \qquad b(1) = \Bigl\lfloor \frac{m(0)+T}{e} \Bigr\rfloor + 2,$$

and, inductively,

$$m(n) = \max_{X \subseteq R^{b(n)}} S(X), \qquad b(n+1) = b(n) + (n+1)\Bigl(\Bigl\lfloor \frac{m(n)+T}{e} \Bigr\rfloor + 1\Bigr).$$

Finally, let $N(M) = b(3^r)$.

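The recursion of Definition 10.1 can be evaluated mechanically. The sketch below assumes $b(1) = \lfloor (m(0)+T)/e \rfloor + 2$ and $b(n+1) = b(n) + (n+1)(\lfloor (m(n)+T)/e \rfloor + 1)$, integer $T$ and $e$ (so floor division is exact), and a caller-supplied function returning $\max_{X \subseteq R^n} S(X)$, or any upper bound for it, which only enlarges the result and keeps it valid. The function name `saturation_bound` is hypothetical.

```python
def saturation_bound(max_measure, T, e, r):
    """N(M) = b(3**r) from the recursion of Definition 10.1.

    max_measure(n) returns max over X in R^n of S(X), or an upper bound for it.
    T and e are assumed to be integers.
    """
    def least_count(m):
        # floor((m + T)/e) + 1: the least integer exceeding (m + T)/e
        return (m + T) // e + 1

    b = least_count(max_measure(1)) + 1          # b(1)
    for n in range(1, 3 ** r):                   # computes b(2), ..., b(3**r)
        b += (n + 1) * least_count(max_measure(b))
    return b


# Setting of Example 10.4 with k = c = e = 1 and T = 0, so m(n) <= n.
print(saturation_bound(lambda n: n, T=0, e=1, r=1))   # 47
print(saturation_bound(lambda n: n, T=0, e=1, r=2))   # 7257599
```

In this toy case the recursion reads $b(n+1) + 1 = (n+2)(b(n) + 1)$, so $b(n) = 2(n+1)! - 1$ and $\log b(n)$ grows like $n \log n$.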
Theorem 10.2. Let $M$ be as above and let $\{A_i \subseteq R^i\}$ be any sequence of disjoint sets. Then there exist, among the first $N(M)$ terms of the sequence, a set $A$ = some $A_i \subseteq R^n$ and a set $B$ = (union of other $A_j$'s) such that $B$ saturates $M$ with respect to $A$ on $R^n$.

In other words, for the purposes of applying the Saturation Theorem, $R^{N(M)}$ is "arbitrarily large".


Proof: If we examine the proof of the Saturation Theorem (7.3), we see that we can saturate $M$ if we can find some $A_i \subseteq R^n$ and $N$ more $A$'s, $A_{i(1)}, \ldots, A_{i(N)}$, such that

1) all the $A$'s have the same vector $\Gamma(A)$;

2) $N > \frac{M+T}{e}$, where $M = \max_{X \subseteq R^n} S(X)$.

Since there are $3^r$ possible values for $\Gamma(A_i)$, the theorem will follow at once from the Lemma below if we choose $p = 3^r$ and $f(X) = \Gamma(X)$ (with the $3^r$ possible values of $\Gamma$ numbered $1, \ldots, 3^r$).


Lemma 10.3. Let $f$ be a function from subsets of $R$ to the set of integers $\{1, 2, \ldots, p\}$. Then from any sequence of $b(p)$ disjoint subsets $A_i \subseteq R^i$ we can extract sets $A \subseteq R^n$ and $A_{i(1)}, A_{i(2)}, \ldots, A_{i(N)}$ with

1) $f(A) = f(A_{i(1)}) = \cdots = f(A_{i(N)})$;

2) $N > \frac{M+T}{e}$, where $M = \max_{X \subseteq R^n} S(X)$.

Proof: By induction on $p$.

First, if $p = 1$, then all the $A_i$'s in the sequence automatically have the same value $f(A_i)$, so choose $A = A_1 \subseteq R^1$ followed by the remaining $b(1) - 1$ sets; condition (2) is fulfilled since $b(1) - 1 > \frac{m(0)+T}{e}$.

Now we assume that the lemma is true for $p$ and prove it for $p + 1$. Suppose we have a sequence of $b(p+1)$ elements, with $f$ taking values in the set $\{1, 2, \ldots, p+1\}$. Break the sequence into two pieces:

the first $b(p)$ elements

and

the remaining $(p+1)\bigl(\lfloor \frac{m(p)+T}{e} \rfloor + 1\bigr)$ elements.

If $f$ applied to the first piece takes on at most $p$ distinct values, then the lemma follows by induction. Otherwise $f$ takes on all $p + 1$ possible values among the first $b(p)$ elements. Now among the second group of elements there is some value which $f$ assumes at least $\lfloor \frac{m(p)+T}{e} \rfloor + 1$ times. But there is also some set $A$ among the first $b(p)$ elements on which $f$ assumes this value. So let the desired sequence be $A \subseteq R^{b(p)}$ followed by those $\lfloor \frac{m(p)+T}{e} \rfloor + 1$ elements selected from the second group; condition (2) holds because $M = m(p)$ when $n = b(p)$.



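The proof of Lemma 10.3 is itself an algorithm. Below is a Python transcription, offered as a sketch: indices are 0-based, the per-stage counts `need[q]` (standing for $\lfloor (m(q)+T)/e \rfloor + 1$) are supplied by the caller rather than computed from a machine, and the function names are hypothetical.

```python
from collections import Counter


def sizes(need):
    """b[0..P] with b[1] = need[0] + 1 and b[q+1] = b[q] + (q+1) * need[q].

    b[0] = 1 is only a placeholder so that b[q] lines up with b(q).
    """
    b = [1, need[0] + 1]
    for q in range(1, len(need)):
        b.append(b[q] + (q + 1) * need[q])
    return b


def extract(vals, P, b):
    """Follow the proof of Lemma 10.3 on the first b[P] terms of vals.

    vals[i] is f of the (i+1)-th set; at most P distinct values occur in
    vals[:b[P]]. Returns (q, a, others): a is the index of the set A, which lies
    among the first b[q] terms (the very first term when q == 0), and `others`
    lists later indices with the same f-value; len(others) >= need[q].
    """
    if P == 1:                                   # base case: all values agree
        return 0, 0, list(range(1, b[1]))
    p = P - 1
    first, second = vals[:b[p]], range(b[p], b[P])
    if len(set(first)) <= p:                     # at most p values: induction
        return extract(vals, p, b)
    # All P values occur in the first piece; the commonest value in the second
    # piece occurs at least need[p] times by the pigeonhole principle.
    v, _ = Counter(vals[i] for i in second).most_common(1)[0]
    return p, first.index(v), [i for i in second if vals[i] == v]


need = [2, 2, 2]
b = sizes(need)                                  # [1, 3, 7, 13]
vals = [1, 2, 3, 1, 2, 3, 1, 2, 2, 1, 3, 2, 1]   # f-values of 13 disjoint sets
print(extract(vals, 3, b))                       # (2, 1, [7, 8, 11])
```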

Example 10.4. Suppose that the measure functions $a_i$ used by $M$ are bounded, i.e., $|a_i(x)| \le k$ for $x \in R$. Suppose also that the size of the retinas $R^n$ grows linearly with $n$, $|R^n| = cn$. (This is sufficient for all applications of the Saturation Theorem in Sections 8 and 9.) Then we can estimate $N(M)$:


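$$m(n) \le kc\, b(n),$$

$$b(n+1) \approx b(n) + \frac{n+1}{e}\bigl(kc\, b(n) + T\bigr),$$

or

$$b(n+1) - b(n) \approx \frac{n+1}{e}\bigl(kc\, b(n) + T\bigr),$$

and we can estimate the growth of $b(n)$ by considering the differential equation

$$\frac{dy}{dx} = \frac{x}{e}\bigl(kc\, y + T\bigr).$$

This has the solution

$$y = a \exp\Bigl(\frac{C x^2}{2}\Bigr) - \frac{T}{kc}, \qquad C = \frac{kc}{e},$$

and so we find

$$\log b(n) \sim \frac{C}{2}\, n^2 .$$

Finally, $N(M) = b(3^r)$, so we get

$$\log N(M) \sim \frac{C}{2}\, 3^{2r}, \qquad \log\log N(M) \sim (2 \log 3)\, r .$$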

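The differential-equation estimate can also be checked directly against the recursion. A short derivation, assuming $b(n+1) \le b(n) + \frac{n+1}{e}\bigl(kc\,b(n) + T\bigr)$ (any rounding up can be absorbed by replacing $T$ with $T + e$) and writing $C = kc/e$ and $\beta(n) = b(n) + T/(kc)$, turns it into an explicit upper bound:

```latex
\begin{aligned}
\beta(n+1) &= b(n+1) + \frac{T}{kc}
  \le \beta(n) + (n+1)\,C\,\beta(n)
  = \bigl(1 + (n+1)C\bigr)\,\beta(n)
  \le \exp\bigl((n+1)C\bigr)\,\beta(n), \\
\log b(n) &\le \log \beta(n)
  \le \log \beta(1) + C \sum_{j=2}^{n} j
  \le \log \beta(1) + \frac{C}{2}\, n(n+1), \\
\log N(M) &= \log b(3^r) \le \log \beta(1) + \frac{C}{2}\, 3^r (3^r + 1)
  \le \log \beta(1) + C\, 3^{2r}, \\
\log\log N(M) &\le 2r \log 3 + O(1).
\end{aligned}
```

Since saturation becomes possible once the retinas reach $R^{N(M)}$, a machine that escapes it needs, roughly, $\log\log |R| \le 2r \log 3 + O(1)$. This is only an upper estimate: with equality, $k = c = e = 1$ and $T = 0$, the recursion gives $b(n+1) = (n+2)\,b(n)$, so $\log b(n)$ grows only like $n \log n$.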
Corollary 10.5. For a bounded linear threshold machine to avoid being saturated on large retinas, the rank must grow at least as fast as $\log\log |R|$.


11. Conclusion.

It is instructive to compare the results of this paper with those of [1]. Minsky and Papert demonstrated limitations of perceptrons of small order, providing mathematical justification for the intuition that these computational schemes are somehow too "local" to deal with such "global" predicates as connectivity. Here we have taken a complementary point of view, investigating the limitations of the linear threshold element itself as a decision element.

Like Minsky and Papert, we believe that the value of this work lies in the general phenomena that it illuminates rather than in the precise statements of the theorems. In our case, we have shown that Minsky and Papert's "stratification phenomenon" appears in the class of linear threshold machines as well as in perceptrons. We have also indicated the importance of saturation as a potential pitfall for any machine attempting to recognize patterns using only a small number of threshold elements.


Hopefully, all of these results will someday be subsumed by a general mathematical theory of pattern recognition, a theory which will clarify the intuitive guess that any system for "general purpose" pattern recognition must have the ability to "focus in on" local features and also the ability to combine this local data in flexible "global" ways.